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Abstract 

Gamma-band rhythmic inhibition is a ubiquitous phenomenon in neural circuits yet its computational 
role still remains elusive. We show that a model of Gamma-band rhythmic inhibition allows networks 
of coupled cortical circuit motifs to search for network configurations that best reconcile external inputs 
with an internal consistency model encoded in the network connectivity. We show that Hebbian plasticity 
allows the networks to learn the consistency model by example. The search dynamics driven by rhythmic 
inhibition enable the described networks to solve difficult constraint satisfaction problems without making 
assumptions about the form of stochastic fluctuations in the network. We show that the search dynamics 
are well approximated by a stochastic sampling process. We use the described networks to reproduce 
perceptual multi-stability phenomena with switching times that are a good match to experimental data 
and show that they provide a general neural framework which can be used to model other ’perceptual 
inference’ phenomena. 


Introduction 

Gamma oscillations are rhythmic patterns of neural activity in the 30-80 Hz frequency range that have 
been measured in the extracellular fields of multiple brain areas across many species [1]. They have been 
associated with attention [2] , working memory [3] , and the execution of cognitive tasks [4]. Local rhythmic 
inhibition is a fundamental feature of Gamma oscillations [5,6] that acts to modulate the firing rate of 
the local circuit as well as its sensitivity to external input [7]. Although many studies have elucidated 
several aspects of the biophysical mechanisms underlying Gamma-band rhythmic inhibition [8,9], its 
computational and functional role is still a matter of debate [1,8]. 

In this paper, we investigate the functional role of local Gamma-band rhythmic inhibition in rate- 
based networks of neurons configured as coupled Winner-Take-All (WTA) circuits. The WTA circuits 
are used as models of local cortical circuit motifs [10]. The strength of the effective connectivity between 
a pair of coupled WTAs is continuously modulated by the phase difference between their local rhythms. 
Higher coherence between these rhythms leads to more reliable communication between the WTA pair 
in line with the “communication through coherence” hypothesis [11, 12]. On a global level, we show 
that oscillatory inhibition drastically alters the dynamics of coupled WTA networks and allows them 
to search for activity states that best satisfy the constraints encoded in the network connectivity while 
being consistent with the externally applied inputs. We show that rhythmic inhibition effectively allows 
neural networks to solve constraint satisfaction problems which are among the most difficult classes of 
computational problems. This result is particularly relevant to neural models of sensory processing. A 
long theoretical tradition casts sensory processing as being a process of inferring the maximally consistent 
interpretations of imperfect sensory input where consistency is judged according to an internal model of 
the environment constructed from prior experience [13-15]. There are two fundamental questions that 
arise when considering the neural substrate underlying the search for consistent interpretations: (1) how 
to learn and represent the consistency model; (2) how to use the consistency model to obtain plausible 
interpretations of ambiguous inputs. The networks we describe address these two questions within a 
biologically realistic framework. 

Several neural architectures have been proposed whose dynamics execute a search for configurations 
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that maximally satisfy a set of constraints. One such architecture uses Hopfield networks that are 
engineered so that their energy functions have the deepest minimum at the maximally consistent state [16, 
17] which thus becomes a stable point of the network dynamics. In general, however, there are other 
non-optimal network states that are also dynamically stable states. There is no guarantee that the 
network will not get stuck in these sub-optimal states [17]. Moreover, in the case of ambiguous input, 
the network does not explore all the best, equally consistent states or input interpretations. A network 
model for solving constraint satisfaction problems using multi-stable WTA-based oscillators has been 
described in [18]. In this model as well, there is no guarantee that the network will not get stuck at 
locally optimal solutions and the model does not address how the internal consistency model can be 
learned. Alternative architectures sidestep the issue of getting stuck at locally optimal states or input 
interpretations by making use of stochastic networks that continuously explore the state space [19,20]. 
Such networks can be configured to visit more consistent states with higher probability. However, for 
an unambiguous input, such networks can not search for, and stabilize at, the fully consistent state, but 
would only transiently visit this optimum state [21]. 

We show that the issues inherent in these architectures can be avoided by using rhythmic inhibition 
to drive the search for maximally consistent input interpretations. The networks we describe never get 
stuck at non-optimal input interpretations; if a fully consistent interpretation of an unambiguous input 
exists, the network will find and stabilize at that interpretation; if the input is ambiguous or the set 
of constraints irreconcilable, the network will continuously switch between the most consistent states or 
input interpretations. In the latter case, we show that the network trajectory can be well approximated 
by a stochastic Markov chain Monte Carlo (MCMC) sampler which is remarkable considering that the 
network trajectory is deterministic. We use this continuous switching between equally consistent in¬ 
terpretations to model perceptual multi-stability phenomena. Unlike the majority of previous models 
though, we start with a ’naive’ network and allow it to learn by example, through Hebbian plasticity in 
the network connections, what constitutes consistent inputs. The multi-stability phenomenon emerges 
when the network receives ambiguous input that does not admit a consistent interpretation according 
to the previously seen examples. The network architecture we describe can be used as a biologically 
motivated neural substrate on which to ground various “perception as inference” theories. 


Results 

Modeling assumptions 

The networks we describe in this paper are composed of a number of local neural circuits that interact 
through long range excitatory connections and that each receive local oscillatory inhibition. The prop¬ 
erties of the oscillatory inhibition model the salient features of the rhythmic inhibition that underlies 
Gamma oscillations. The following are biological justifications for the various modeling assumptions we 
use: 

1. WTA circuits are local neural circuit motifs: WTA circuits are potential cortical circuit motifs [10]. 
The WTA circuits can be replaced by any neural circuit as long as this circuit displays a number 
of distinct firing patterns based on external input and a memory of past firing patterns. Some 
distinct part of each firing pattern should be characterized by a high enough firing rate so that it 
can influence the firing pattern in another neural circuit when the two are coupled. 

2. Oscillatory inhibition is local to each WTA: Gamma oscillations typically have a local origin [1,5,22]. 
Gamma oscillations in local field potential typically arise from the rhythmic firing of basket cells 
that have predominantly local arborization [23]. It is a general phenomenon that faster rhythms 
tend to develop locally [6]. 
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3. Local oscillatory inhibition is strong enough to shut down the activity in the local circuit: There is 
strong evidence for the involvement of interneurons containing the calcium binding protein Parval- 
bumin in the oscillatory discharge underlying Gamma oscillations [7,24]. Interneurons expressing 
Parvalbumin, such as basket cells and chandelier cells, mainly target the soma, the axon initial 
segment, and the proximal dendrites of the excitatory principal cells [25] making them particularly 
effective in rhythmically silencing their target neurons. 

4. The local oscillatory inhibition waveforms have different frequencies: There is evidence that Gamma 
oscillations recorded from even nearby regions in the same cortical area have significantly different 
frequencies [26]. Phase relations between Gamma oscillations recorded from nearby points in the 
cortex are continuously changing [2,23], and the phases of local cortical Gamma rhythms vary in 
an irregular manner [27]. Different oscillation frequencies is one simple way to obtain continuously 
changing phase relations. The different local rhythms can be highly coherent and we investigate 
the effect of this coherence on the model behavior. The only requirement of our model is that the 
space of possible phase relations is continuously explored with no perfect and persistent phase lock 
between any of the local rhythms. 

Network description 

We simulate and analyze rate-based population-level networks. Each neural population in a WTA is 
modeled as a linear threshold unit (LTU) following a mean-field approach [28-30] (see Methods section). 
All excitatory connections have a slow and fast component to capture the effects of the different time 
constants of the synaptic currents mediated by slow NMDA receptors and fast AMPA receptors. The 
network architecture and example simulation results are shown in Fig. 1. When a WTA is released from 
oscillatory inhibition, the excitatory population receiving the largest external input wins and suppresses 
the activity in the other excitatory populations in the WTA. The A-mediated recurrent synaptic 

current has a time constant that is longer than the oscillatory inhibition period. Since this current is 
always larger in the winning population, it is able to bias the competition so that the winning population 
keeps on winning in subsequent inhibition cycles after the external input is removed as shown in Fig. Ic. 

Figure Id shows two coupled WTA circuits in which the coupling connections encode a consistency 
condition. Each WTA circuit represents a discrete variable with two states: A and B and the coupling 
connections dictate that when population A is winning in one WTA circuit then population B should 
be winning in the other. The activity of one WTA circuit, however, is able to influence the state of the 
other WTA only when its interval of heightened activity coincides with the winner selection interval (the 
interval just after oscillatory inhibition goes low) in the other WTA circuit. This effect is illustrated 
in Fig. le where we choose slightly different frequencies for the oscillatory inhibition in the two WTA 
circuits. In the supplementary material, we describe how arbitrary consistency conditions involving more 
than two WTAs can be realized. 

This example shows how the effective strength of the inter-WTA coupling connections depends on the 
phase difference between their rhythmic inhibition cycles. This is compatible with the hypothesis that 
the effective strength of the fixed anatomical connections in the brain is modulated by the dynamical 
state of the communicating neural circuits which allows the quick formation and removal of effective 
“communication” links [31,32]. The phase difference between the rhythms of two neural groups is a 
potential marker for the strength of the effective link between them [2,11]. This view is particularly 
appealing in the case of gamma oscillations due its rhythmic modulation of excitability and firing rate [12]. 

Properties of the network trajectory and the search for consistent states 

A WTA circuit can be involved in a number of different consistency conditions. As the rhythmic inhibition 
wears off in one WTA circuit, other WTA circuits that are coupled to it can influence its state so as to 


4 






Excitatory population 

Inhibitory population 

Excitatory connection (AMPA + NMDA) 
Inhibitory connection 
Oscillatory inhibition 



Local circuit is insensitive to external input 

Local circuit integrates external input 

Local circuit manifests a particular firing pattern 


(a) 






Figure 1. (a) Generic structure of a local circuit that undergoes oscillatory inhibition. Oscillatory 
inhibition is modeled as a periodic rectangular function, (b) A binary WTA circuit modulated by 
oscillatory inhibition, (c) Simulation results of the binary WTA. Oscillatory inhibition periodically 
shuts down all activity. External input (colored horizontal bars) can bias the winner selection process 
and in its absence, the identity of the winning excitatory population is maintained across the inhibition 
cycles, (d) Coupling two WTA circuits , VO and VI, to enforce the consistency condition VOt^VI. (e) 
Simulation results of the network in (d) when the frequency of the oscillatory inhibition is slightly 
different in the two WTA circuits. Vertical lines are visual guides to the end of the high phase of 
oscillatory inhibition in the VO WTA circuit. 





























































































5 


satisfy the consistency conditions encoded in the pattern of coupling connections (see Fig. Id for example). 
Exactly which consistency condition, if any, gets satisfied by the new state/winner of the WTA circuit 
depends on the level of activity in the WTA circuits connected to it, which in turn depends on the phase 
of the inhibition cycle in these WTA circuits. Since the phase relations between the WTA circuits are 
continuously changing, the relative strengths of the different consistency conditions are also continuously 
changing. 

If we consider each WTA as a discrete-valued variable whose value is the identity of the last winning 
population, we can enumerate all possible configurations of a network of interacting oscillatory WTA 
circuits; for example, a network of N coupled WTA circuits with two excitatory populations each can 
have 2^ possible configurations. Define d{t) as the discrete-valued, continuous-time, dynamical variable 
that denotes the configuration of the network at time t. This variable is updated each time the local 
oscillatory inhibition shuts down the activity in a WTA circuit to reflect the identity of the WTA circuit’s 
last winning population at that time. We can prove the following two results about the trajectory of d{t): 

proposition 1. The only fixed points ofd{t) are the configurations that satisfy all consistency conditions 

proposition 2. d{t) is not periodic as long as it has not reached a fixed point 

The proofs are given in the supplementary material. The network thus traverses the space of possible 
WTA configurations in an irregular, aperiodic manner until it finds a fully consistent configuration. Ex¬ 
ternal inputs can clamp the states of some WTAs, i.e, force particular populations to win the competition 
in each inhibition cycle. In this case, the rest of the network would search for configurations that satisfy 
the consistency conditions, given the states of the input-clamped WTAs. We illustrate this behavior using 
a hard 9*9 Sudoku instance (with one unique solution) that is shown in Fig. 2a. The network embodying 
this problem uses a WTA circuit with 9 excitatory populations to represent each square. The index of the 
winning population in each WTA encodes the number in the underlying square. The Sudoku constraints 
are implemented by coupling each square/WTA to every other square/WTA in the same row, column, 
or 3*3 block with a pair-wise connection scheme that encodes an inequality constraint: each excitatory 
population in one square/WTA is connected to all the excitatory populations in the other square/WTA 
except the excitatory population having the same index (Fig. Id shows an inequality constraint in the 
binary WTA case). Each WTA will then try to force all the other WTAs in the same row, column, and 
3*3 block to encode a different number. External input forces every WTA that represents a pre-filled 
square to always encode the pre-filled number. For example, the top left WTA/square receives external 
input so that population 8 always wins in each inhibition cycle. 

Figures 2c, 2d show that the network quickly yields reasonably good solutions that only violate a 
few constraints, and on average tends to occupy better configurations the longer it is allowed to run. 
The network is able to escape from locally optimal configurations and in all trials, it manages to find 
and stabilize at the solution to the Sudoku problem. Fig. 2b shows the convergence times when using 
oscillatory inhibition rhythms with an average frequency of 50 Hz. 

Irreconcilable consistency conditions and the sampling analogy 

According to proposition 1, if the consistency conditions encoded in the coupling connections can not all be 
simultaneously satisfied, the network will never settle in one configuration. This is the case for the network 
shown in Fig. 3. Figure 3b shows that the network tends to spend more time in configurations that 
violate fewer consistency conditions. Figure 3c shows the relative duration of time the network spends in 
each configuration. The plot in Fig. 3c can be interpreted as the plot of a probability distribution over the 
possible configurations of the network. We take this stochastic interpretation further by approximating 
the network dynamics by a Markov chain Monte Carlo (MCMC) sampling process. Based on the network 
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Figure 2. (a) A 9*9 hard Sudoku instance. Each square is represented by one WTA with 9 excitatory 
populations, which receives oscillatory inhibition, (b) Histogram of the convergence time to the unique 
solution in 100 trials starting from different initial conditions, (c) Mean and standard deviation of the 
number of inequality constraints violated by the Sudoku network at each time point across the 100 
trials, (d) A zoom-in of the first 0.5 s in c. 
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Figure 3. (a) Topology of a network with irreconcilable consistency conditions. Each square 
corresponds to a binary WTA. (b) Normalized duration of time spent by the network in each of the 64 
configurations as a function of the number of violated consistency conditions. Each cross denotes one 
configuration, (c) Normalized durations of the 64 network configurations obtained by simulating the 
actual network for 1 x 10® seconds and the relative frequencies of occurrence of these configurations in a 
sequence of 10® samples generated by a stochastic MCMC operator constructed to approximate the 
network trajectory. The configuration index is the decimal value of the binary string encoding the 
states of EO to V5. (d) Trajectory of the dynamical variable d{t) for the network in a for the first 5 
seconds (on the left), with the normalized duration of time spent in the different configurations during 
those 5 seconds (on the right, blue bars). As the simulation progresses, the normalized time spent in the 
different configurations approaches the normalized duration results of c (reproduced on the right, in 
light red). 






































topology, we construct a stochastic Markovian transition operator T that approximates the network 
trajectory (see Methods section). 

We compare the normalized duration of time the trajectory of d{t) spends in a particular configuration 
to the normalized frequency of occurrence of this particular configuration in the sequence of samples 
generated by the stochastic transition operator T. The results are shown in Fig. 3c for the example 
network. The two sets of statistics are similar even though they were generated by two widely different 
classes of systems: one that is deterministic, continuous-valued and running in continuous time, and the 
other stochastic, discrete-valued and running in discrete time steps. The sampling analogy is reinforced 
by the aperiodicity of d(t) which yields an irregular trajectory of the network state that is akin to a truly 
stochastic trajectory as shown in Fig. 3d. The difference between two probability distribution p and q 
can be quantified using Kullback-Leibler(KL) divergence: 

KL{p,q) = = x,)loa 

For the network shown in Fig. 3, KL{pnf,twork,PMCMc) = 9-0* lQ~'^nat. In the supplementary material, 
we show that the stochastic approximation is still good for various network sizes and inter-WTA connec¬ 
tion densities (see Supplementary Fig. SI) and analyze the properties of the MCMC operator as well as 
discuss in more detail the assumptions underlying the stochastic approximation. 

Of course, given the statistics of the network trajectory, it is trivial to construct an MCMC operator 
that has these statistics as a stationary distribution using any of the standard sampling methods such as 
Gibbs sampling. The MCMC operator described in this section, however, was constructed a-priori, i.e, 
before simulating the network, as an approximation of the network trajectory. 

Effect of noise and non-zero coherence 

The deterministic networks we present exploit the continuously shifting phase relations between different 
WTAs in order to explore the space of possible network configurations. We added white Gaussian noise 
to the input of all LTUs and random fluctuations to the phases of all inhibitory rhythms (see Methods 
section). Figure 4a shows that noise ‘smears out’ the distribution of relative durations by biasing the 
trajectory towards low-duration states and away from high-duration states. This smearing out effect is 
more pronounced for larger random fluctuations. This is reminiscent of how higher noise/temperature 
in a thermodynamical system acts to increase the system entropy by making the probability distribution 
over the system states more uniform. There is a crucial difference between these networks and thermody¬ 
namical systems, however: At the zero noise (zero temperature) level, a thermodynamical system quickly 
freezes at a single state while our networks will continue to explore the configuration space. 

Gamma rhythms in distant neural assemblies often exhibit non-zero but still imperfect (non-unity) 
coherence [11]. We model this phenomenon by coupling the phases of the inhibitory rhythms in different 
WTAs according to the Kuramoto model [33] in order to realize phase relations that are consistently 
near zero. This phase coupling was introduced between variables/WTAs VO and V3 in the network in 
Fig. 3a and all network components were perturbed by random fluctuations. Figure 4d shows that V0-V3 
coherence increases the duration of all configurations in which the V0-V3 consistency condition is satisfied 
and decreases the duration of all configurations in which it is not. Increased coherence between two WTAs 
thus leads to stronger interaction and makes it more difficult to violate the consistency conditions between 
them. 

Learning the consistency conditions 

Like any connectionist architecture, the knowledge in the networks we describe is encoded in the inter- 
WTA connections. These connections represent the consistency conditions which, through the network 
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Figure 4. (a) Statistics of the trajectory of the network in Fig. 3a in the non-noisy, low noise, and high 
noise cases, (b) Mean square coherence between the oscillatory inhibition in variables VO and V3 in the 
high noise case when their phases are coupled and when they are uncoupled, (c) statistics in the high 
noise case when VO and V3 are phase coupled and when they are uncoupled, (d) Change in the relative 
(normalized) duration of each network configuration when V0-V3 become phase coupled. Data was 
obtained from c. Blue bars indicate configurations in which the V0-V3 consistency condition is satisfied 
while red bars indicate configurations in which the consistency condition is violated. 
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dynamics, induce a “probability distribution” over the network configurations. If made plastic, the 
strengths of these connections will be continuously modulated by the changing network state. By forcing 
a particular configuration on the network, external input can thus reorganize the pattern of these inter- 
WTA connections so as to maximize the consistency of the input-imposed configurations as interpreted 
according to the pattern of inter-WTA connections. A central question in such a learning scheme is how 
the plasticity rule can distinguish between configurations that are input-imposed, and thus should be 
learned, and configurations that arise naturally when the network is running freely. This central question 
also arises in stochastic, sampling-based connectionist architectures that learn probability distributions 
by example such as Restricted Boltzmann Machines [34]. 

To distinguish an input-imposed configuration as a configuration that should be learned, i.e, that 
should serve as a model for consistent configurations, we have external input synchronize the activity 
of the WTA circuits it targets so that the oscillatory inhibition in these WTA circuits has a common 
frequency and a zero phase difference. Synchronization is thus a dynamical marker for configurations that 
should be learned. We use a bistable Hebbian plasticity rule that acts on the weights of the inter-WTA 
coupling connections. A drift term pushes the synaptic weight to one of two stable values: high or low 
(see Methods section). 

In a rate based network, we are unable to capture the way gamma-band synchronization enhances 
plasticity by forcing neurons in multiple WTA circuits to spike within narrow time windows [35]. However, 
the plasticity rule we use results in the same functional effect. In order for a depressed connection to 
potentiate, both pre- and postsynaptic rates need to be large at the same time for many consecutive 
inhibition cycles in order to overcome the bistability drift. This will only reliably happen if the oscillatory 
inhibition in the pre- and post-synaptic WTA circuits have a measure of synchrony where their phase 
difference is around zero for many cycles so that the peaks of excitatory activity in the WTA circuits 
coincide. 

Learning in a model of perceptual multi-stability 

In this section, we describe a network that is able to learn a consistency model by example and then 
reproduces a perceptual multi-stability phenomenon when faced with ambiguous inputs. The particular 
form of perceptual multi-stability we consider is binocular rivalry which has been well studied experi¬ 
mentally [36] as well as in theory [37]. In binocular rivalry experiments, each eye is shown a differently 
oriented grating. The subject’s perception then continuously switches between the two orientations [36]. 

Network Model The network is composed of rihid = 6 hidden and = 6 input WTA circuits that 
each undergo oscillatory inhibition. Each WTA has six competing excitatory populations, tic = 6, and 
can take one of six different states (orientations), ric = 6 implies that each excitatory population codes 
for a range of orientations of = 30°. 

The full network connectivity is illustrated in Fig. 5. Input and hidden WTA circuits are set up in 
pairs. Each input WTA connects to one hidden WTA so that populations of similar orientations are 
bidirectionally coupled. The connectivity between hidden WTA circuits is all-to-all and initialized using 
random weights: G [wmin,Wmax] that are plastic and follow a bistable Hebbian plasticity rule(see 

Methods section for more details). 

Training and Testing The simulations we run have two distinct phases, a training phase in which 
we cycle through consistent inputs by clamping the input WTAs to the same orientation, and a testing 
phase in which we clamp a subset (or all) of the input WTAs at various orientation patterns and decode 
the activity of the hidden WTA circuits. During the two phases we use the same synaptic rule and 
parameters] learning is effectively enabled/disabled by providing/withholding a synchronizing oscillatory 
input to the inhibitory oscillators of all WTAs (see Methods for further details). 
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Figure 5. Network architecture for the perceptual multi-stability task. Hidden WTAs are inside the 
grey circle while input WTAs are outside. All explicitly drawn connections are fixed connections. The 
six hidden WTA circuits have all-to-all plastic connectivity, i.e, each excitatory population in one 
hidden WTA connects to each excitatory population in the five other hidden WTA circuits. Local 
oscillatory inhibition in each WTA and the recurrent excitatory connections are not shown. 
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Figure 6. The “probability distribution” of the orientation vector decoded from the hidden WTAs in 
the network in Fig. 5 under different conditions. Black dots indicate states that can be encoded by the 
network, and blue empty circles indicate visited states. The area of the filled red circle at each visited 
location is proportional to the time spent at that state (scale differs between plots). The inset Gabor 
patches indicate the stimuli presented to the left and right eye in analogous experimental settings, (a) 
A network trained on consistent inputs propagates the clamped angle of one input WTA to the other 
WTA circuits and stably represents that angle, (b, d, f, h) An untrained network spends most of the 
time in “low confidence states” where the hidden WTA circuits encode different angles in all input 
cases, (c) The trained network switches between two interpretations of an ambiguous input, (e) 
Increasing the number of input WTA circuits encoding 135° to two makes that interpretation more 
probable compared to the 45° interpretation (g) The trained network under fully conflicting input 
preferentially visits consistent states, where all populations represent the same direction. “Low 
confidence” states are also visited. 
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(a) 



(b) 

Figure 7. (a) Readout WTA performing the perceptual decision similar to [38]. (b)The perceptual 
switching times produced by our model fit a gamma distribution that largely agrees with experimental 
data from [36]. 


To visualize the network behavior, we sum the preferred orientation vectors of the winning excitatory 
populations in the hidden WTAs to obtain a vector whose angle represents the current guess of the 
network at the true orientation which it ‘senses’ through the input WTA circuits. The vector’s magnitude 
represents the confidence of the network in this orientation estimate (see Methods section for more details). 
Fig. 6 illustrates the trained and untrained network responses in various input cases: unambiguous input 
in Figs. 6a and 6b (one input WTA clamped at 45° orientation), ambiguous input in Figs. 6c,6d,6e, 
and 6f (one input WTA clamped at 45° and one or two other input WTAs clamped at 135°), and a fully 
conflicting input in Figs. 6g and 6h (each input WTA is clamped at a different angle). 

To be able to make a quantitative comparison to human experiments, we investigated the percep¬ 
tual switching times our model produces when presented with an ambiguous stimulus. We added an 
‘integrative’ component to the system based on [38] in which a bistable decision task for random dot 
stimuli is modelled: A bistable readout attractor network receives time-varying inputs and settles into 
one attractor state. The time varying input is produced by the hidden WTAs and is slowly integrated by 
the readout attractor network. The bistable attractor’s population with the higher activity corresponds 
to the current percept, see Fig. 7. 


Discussion 

Several theories postulate that the key computational machinery of sensory and association cortices is 
devoted to solving best-match problems [39] where a number of interacting areas try to agree on a 
global interpretation that best explains sensory data while being consistent with an internal model, or 
a prior, of the environment [14, 15]. We showed that oscillatory inhibition that gives rise to rhythmic 
modulation of firing rate and sensitivity to external inputs in the local circuitry allows a neural network 
to solve best-match problems. If no fully consistent network configuration exists, the network trajectory 
wanders aperiodically and can be approximated by a stochastic sampling process. We have shown that 
deterministic networks can reproduce effects that were commonly modeled using stochastic networks such 
as perceptual multi-stability and the explaining away effect (see supplementary Fig. S2). Our results show 
that physical noise mechanisms are not strictly necessary for networks to explore complex configuration 
spaces without being trapped by locally optimal configurations. 

Increased Gamma-band coherence between distant neural groups has been found to increase the 
strength of their mutual influence [11,12] making the coherence level of the gamma oscillations a candidate 
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mechanism for quickly modulating the effective strength of anatomical connections (the “communication 
through coherence” hypothesis). In agreement with this effect, we have shown that if oscillatory inhibition 
in the WTA circuits were forced to exhibit some coherence which leads to consistent and favorable phase 
relations (phase differences that are around zero), communication between the coherent WTA circuits 
would be more effective as the consistency conditions inside the coherent set would be more difficult to 
violate(see Fig. 4d). 

We have used synchronization as a way to trigger learning. There is evidence that long-range syn¬ 
chronization is required for effective learning and memory formation [40,41] and that it is a candidate 
mechanism for establishing associations between disparate neural groups [42]. Gamma band synchroniza¬ 
tion in particular is well-suited for establishing associations through STDP mechanisms [35] as neurons 
fire preferentially at a particular phase of the gamma cycle (when the oscillatory inhibition is low), thereby 
aligning their spikes to a precision of a few milliseconds which is a suitable window for inducing synaptic 
potentiation or depression [43]. 

Our perceptual multistability model is, to our knowledge, the first model that reproduces perceptual 
multistability phenomena without making use of explicit stochastic dynamics. An experiment that could 
provide evidence as to whether our model does underlie multi-stable perception would investigate the 
relationship between perceptual switching times and how fast Gamma-band phase differences between 
fields measured at different points on the visual cortex change: faster changes in phase relations translates 
to a more exploratory behavior in our model as the effective network connectivity changes more rapidly 
and should translate to faster switching times between the different percepts. 


Methods 

Network equations 

The basic building block of all networks is the linear threshold unit (LTU). LTUs together with an 
oscillatory inhibitory population make up an oscillatory WTA as shown in Figs, la and lb. The excitatory 
populations in different WTA circuits are coupled together to implement consistency conditions as shown 
in Fig. Id. The full description of a network having M WTA circuits, with excitatory population in 
WTA k is: 




M Nk 


+ - T^) (la) 


k^l 


Ni 


(^) + (^) = {t) + - T^) (lb) 


NMDA K 


:7:Si,(<) + Sz,W = a:„-(t) 


M 


— K,i5[(j)i) -I- ^ Zk^iSin{(j)k{t) - (piit)) 


(l c) 

(l d) 


k^l 

k^i 


H^{t) = 


A if (pi < d ■ 2 tt 
0 otherwise 


(le) 


15 


xfj(t) and xj{t) are the the activities of the excitatory population, and the inhibitory population in 
WTA i respectively. f{x) is the linear threshold function: f{x) = Max{0,x). (j)i is the phase of the 
oscillatory inhibition in WTA i. (j)i automatically wraps around to stay in the range [0, 27r). Hi(t) is 
the actual oscillatory inhibition waveform in WTA i which, within each period, is high (with amplitude 
A) for a fraction d < 1 of the oscillation cycle. xfj{t) is low-pass filtered with time constant to 

yield which is used to model slowly decaying currents mediated by NMD A receptors, r®, and 

T^, are the time constants and thresholds of the excitatory and inhibitory populations respectively. 

and are the inter-WTA AM PA- and A^MZl A-mediated connections respectively 

which connect population xf^ to population xfj. These inter-WTA coupling weights are fixed in all 
simulations except the simulations of the perceptual multi-stability model where they are plastic and 
obey the plasticity rule given in Eq. 3. The remaining weights are the intra WTA weights. rjf (t) 

are informally treated as derivatives of independent, zero mean, unity variance Wiener processes. We 
could thus use the Euler-Maruyama method to carry out the noise simulations where in each time step 
At, the random processes ?7^(t), (t) generate independent Gaussian increments with zero mean and 
variance At. Ki is a zero mean, unity variance Gaussian random variable. The term Ki6{(f>i) in 

Eq. Id indicates that at the beginning of each oscillation cycle, (j)i = 0, a. random increment Ki is 
added to the oscillation phase. This increment is only added once per cycle so the phase has to cross the 
27r point and reset before another increment is added, and are noise scaling factors. Zk^i is 

the phase coupling strength between oscillatory inhibition in WTAs k and i. We always use symmetric 
phase coupling: Zi^j = Zj^i. The phase coupling model follows the Kuramoto model [33]. 

The values of all model parameters used in the simulations are given in the supplementary materials. 

Network equivalent MCMC operator 

In MCMC sampling, a stochastic Markovian transition operator R defined over a space X is used to 
generate a stochastic sequence of points, or samples, in X: Xi,X2, ■ ■ ., where Xn+i is a sample drawn from 
the conditional probability distribution: 

pixn+i = z\x„ = y) = R{y -)■ z) (2) 

R(y —>■ z) is the probability that the next point in the sequence is 0 given that the current point is y. If 
R is irreducible (For any two states, xi and X 2 , X 2 will with non-zero probability appear in the random 
sequence started from xi after a finite number of steps), and R is aperiodic (starting from any state x, 
the indices of the positions in the sequence where x has a non-zero probability of occurring do not have a 
common divisor other than 1) then for any starting point xq, p(xn = y) approaches the quantity q(y) as 
n gets large for all y G X. In other words, the points x^, Xn+i, ■ ■ ■ are samples drawn from the probability 
distribution q{x), called the stationary or invariant distribution of the transition operator i?, for large 
enough n. 

Given a network of interacting WTA circuits, our goal is to find a Markovian stochastic transition 
operator, T, that is defined over the space of all network configurations and which approximates the way 
the network trajectory moves through this configuration space. We here describe how, given a particular 
network configuration the next configuration Xn+i is evaluated. This evaluation procedure implicitly 
defines the transition operator T and it attempts to approximate the way the configuration of the actual 
network changes. 

1. Ghoose at random a WTA V to be updated. This corresponds to a WTA which has just been 
released from oscillatory inhibition and is now selecting a winner. 

2. Choose at random a subset of the consistency conditions that involve the WTA V. Denote this 
subset as C. The WTA circuits involved in this subset are assumed to have the proper phase 
relations to V that enable them to affect the winner selection in V. 
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3. For each possible state Si of WTA V (the number of possible states is the number of excitatory 
populations in the WTA), attach a number rii that counts the number of consistency conditions in 
C that are satisfied if V is in state Si. The states of the other WTA circuits are fixed and encoded 
in Xn- Find the maximum of rii across all states Si and denote it as n. Define Sc = {sfc : = n}. 

Sc is thus the set of possible states for WTA V that are favored by the majority of the consistency 
conditions in C. If the current state of V is in Sc, then the state of V remains unchanged and 
Xn+i = Xn- This reflects the hysteresis mechanism mediated by the slow NMD A current that 
favors the WTA population that won in the previous inhibition cycle. If the current state of V is 
not in Sc, then a new state for V is selected randomly from Sc and Xn+i is updated accordingly. 
Note that at most one WTA changes its state between and Xn+i- 


Plasticity rule for inter-WTA connections 

We use the following bi-stable plasticity rule to modulate the inter-WTA connection strengths: 


where 


dw{t) 

dt 


d{w) 

V+ iTpre, fpost) * [ Tpre (t) - 9] ■ [rpostit) - 9] 
V— (j’pre') ’^post) ' \j' pre (t) - 9] ■ [9 - rpost(t)] 


d{w) 


dup if W{t) > Wrmd 

—ddown otherwise 


V+i'^^pre, 


'^post) — 



if rpre{t) > 9 and rpost{t) > 9 
otherwise 


V— iTpre, ‘^post) 


'ndown if ’^pre {t) > 9 and rpost{t) < 9 
0 otherwise 


( 3 ) 


w{t) is the weight. Xpreit) and rpost(t) are the firing rates of the source (presynaptic) and target 
(postsynaptic) populations respectively. This rule is a variation of the Bienenstock-Cooper-Munro (BCM) 
rule [44] with hard weight bounds Wmin and Wmax (not shown in the equation) and the requirement that 
T'preit) has to exceed a threshold, 9, in order to induce any change in the weight w(t). If this requirement is 
met, potentiation is induced if the postsynaptic activity, rpost{t), is above the threshold 9, and depression 
is induced if the postsynaptic activity is below the threshold. The rates of potentiation and depression 
induction are controlled by rjup and rf^own respectively. The rule captures the way potentiation and 
depression induction depend on the pre- and post-synaptic firing rates [45]. The rule contains a second 
component that slowly forces the connection weight to either Wmin or Wmax depending on whether the 
weight is below or above Wmid = \{wmax + Wmin) respectively. The rule is thus bistable. The strength 
of the bistability drift is controlled by dup and ddown- Bistable plasticity is computationally less powerful 
than its counterpart with continuous stable weights [46], but can be argued to be more biologically 
realistic, due to noise-tolerance and finite synaptic information content. 


Details of the perceptual multi-stability model 

To decode the network activity, we first scale the angles of the normalized preferred directions of the 
excitatory populations in the hidden WTAs from the 0 — 180° range to the 0 — 360° range. We then 
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perform a vector addition of the modified preferred directions of all active populations of the hidden 
WTA circuits to obtain the two quantities: 



r = 




atan2 




( 4 ) 

( 5 ) 


where atan2 is the two argument quadrant adjusted inverse of the tangent, Xi (yt) is the x-coordinate 
(y-coordinate) of the modified preferred direction of population i and Si € {0,1} is the indicator of the 
activity of population i. The summation runs over all excitatory populations in the hidden WTA circuits. 
The factor 4 in Eq. 4 puts the decoded angle back in the 0— 180° range, r, the magnitude of the decoded 
activity vector, can be understood as the confidence of the system in its current angle estimate. When 
all hidden WTA circuits encode a different angle, r takes a value close to zero, when they all encode the 
same angle, it takes the maximal value Uhid = 6. 

“Training” the network refers to the following procedure: For 30 seconds, we provide input that clamps 
the states of all input WTA circuits so that they are all encoding the same orientation, while enabling 
the global synchronizing oscillation. Within these 30 seconds, we cycle through the ric = Q directions 
each WTA can encode. The configuration of the input WTA circuits is thus forced to represent an 
unambiguous orientation which would act to influence the hidden WTA circuits so that they are also 
encoding the same unambiguous orientation. The connections between the hidden WTA circuits change 
to represent the prior that all hidden WTA circuits usually encode the same orientation. 

During testing, we provide inputs that clamp the states of a subset, or of all, input WTA circuits to 
certain orientations for 4 minutes while withholding the global synchronizing oscillation. We record the 
activity of the hidden WTA circuits and decode it into an angle and a magnitude as outlined in Eqs 4 
and 5. We interpret the normalized time histogram of this decoded signal as the probability the network 
assigns to a given r, 0 pair. 
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Proofs of propositions 1 and 2 

The frequencies of the local oscillatory inhibition in the WTA circuits are different and are not rational 
multiples of each other. Having one oscillation frequency that is a rational multiple of another is statisti¬ 
cally impossible in a physical system with real-valued frequencies. Let ^{t) = ■ ■ ■, 0n(O] be 

the vector of the phases of the oscillatory inhibition in the n WTA circuits in the network at time t where 
(j)i G [0, 27r). By virtue of the incommensurability of the oscillatory inhibition frequencies, ^(f) follows 
a quasi-periodic trajectory. For any initial phase vector 4>(0), and any $ G [0, 27r]” and e > 0, there is 
a time t such that |$(t) — $| < e. The vector of phases $(t) thus densely fills up the n-dimensional 
hyper-torus of all possible phases which implies that $(t) cannot have a periodic trajectory. Since it is 
the phase relations between the WTA circuits that determine the effectiveness of the different consistency 
conditions, the ergodicity of the phase vector ^{t) ensures that all possible combinations of the strengths 
of the consistency conditions are eventually explored. In a WTA circuit Wi, the oscillatory inhibition is 
high for hi seconds and low for h seconds during each cycle. In order to prove some useful results about 
the trajectory of d{t), we require that hi > ^ i. This condition, together with the ergodicity of the 

phase vector, guarantees that for any WTA circuit Wr, there is bound to come an oscillation cycle 
where the winner selection is unaffected by the activity in any other WTA, because during that cycle, Wr 
is released from oscillatory inhibition, selects a winner, and gets shut down again by oscillatory inhibition 
within h seconds, while all other WTA circuits are quiescent. 

proposition 1. The only fixed points of d{t) are the eonfigurations that satisfy all consistency conditions 


Proof. Assume d{t) has a fixed point d* which encodes a particular non-changing configuration and that 
this configuration violates one or more consistency conditions. Let Vi and V 2 be two WTA circuits that 
are part of a violated consistency condition. Due to the ergodicity of the phase vector, $(f), there is 
bound to come an oscillation cycle for V 2 which has the following properties: during the part of the cycle 
when oscillatory inhibition is low in V 2 , oscillatory inhibition is high in all other WTA circuits except Vi 
and the peak of activity in the excitatory populations in Vi occurs at the time in which it can dictate 
the winner in V 2 . The influence of Vi on V 2 is thus unopposed by any other WTA and this influence 
acts to enforce the violated consistency condition by changing the identity of the winner in ¥ 2 - Thus, 
a configuration that violates one or more consistency conditions is unstable. Assume d(t) has reached 
a point d which encodes a configuration that satisfies all consistency conditions. The currently winning 
population in each WTA circuit will receive from other WTAs either a stronger or an equal input as 
compared to the other populations in the WTA. In both cases, and due to the hysteresis mechanism 
mediated by the slow recurrent NMD A current, the winning population in each WTA circuit keeps on 
winning and the global configuration d remains unchanged hence d is a fixed point. □ 

proposition 2. d{t) is not periodic as long as it has not reached a fixed point 

Proof. Since d(t) has not reached a fixed point, d{t) alternates between a number of configurations. Let 
U be a WTA in which the identity of the winning population changes between these configurations. 
Assume d{t) has period T. Fix to, at some U where to < ti < to + T, WTA V changes its state, i.e, it 
selects a winning population that is different from the winning population in the previous cycle. This 
change is reflected in d{t) at ti when the oscillatory inhibition goes high in V. V has to change its state 
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Figure SI. The KL-Divergence between the “probability distribution” induced by the network 
trajectory and the stationary distribution of the equivalent MCMC sampler, calculated for 60 randomly 
generated regular graphs with sizes (number of nodes) in the range {6,8,10,12} and node degrees in the 
range {3,4,5}. There are 5 graph instances for each size-degree combination. In (a) the KL-Divergence 
is plotted as a function of the number of nodes in the graph (15 data points per item) and in (b), it is 
plotted as a function of degree (20 data points per item). The red line indicates the median, the box 
outlines the 1st and 3rd quartile and the whiskers show the full range of the data. 


in the same way at -I- nT where n = 1,2,... by the periodicity assumption of d(t). So T has to be a 
multiple of the oscillation period of V. Since the frequencies of oscillatory inhibition in the WTA circuits 
are incommensurable, T is incommensurable with the oscillation periods of all WTA circuits except V. 
Hence, there is bound to come an oscillation cycle for V with the following two properties: during the 
part of the cycle when oscillatory inhibition is low in V, oscillatory inhibition is high in all other WTA 
circuits; and the cycle terminates at ti + kT when the inhibition in V goes high where k is an integer. 
So it is impossible for V to select a winner in this cycle that is different from the winner it selected in 
the previous cycle (in the absence of external influence throughout this cycle, the hysteresis mechanism 
mediated by the slow NMD A recurrent connections yields the same winner as the previous cycle). This 
contradicts the initial assumptions about V (that it selects a different winner at ti -|- kT) and establishes 
the aperiodicity of d{t). □ 

Stochastic approximation of the network trajectory 

Based on the topology of the network, an MCMC operator can be constructed that approximates the 
network trajectory (see Methods section). The stationary probability distribution of this MCMC operator 
does not exactly reproduce the relative durations of the different configurations in the actual network. 
There are many reasons for this discrepancy: 

• The state of a WTA (the identity of the winning population) in one oscillation cycle is not only 
affected by its state in the preceding cycle but also by its state in cycles further in the past. That 
is because the time constant of the slow recurrent NMD A current is 80 ms and the mean period of 
the oscillatory inhibition in the WTA circuits is 22 ms. When a population wins in one inhibition 
cycle, its increased recurrent NMDA current takes many cycles to decay which might affect the 
winner selection process many cycles later. The Markovity property which is fundamental to the 
construction of T is thus not exact. 

• Some WTA circuits (those that have a higher oscillation frequency) update more often than others. 
This violates the assumption inherent in step 1 in the MCMC update scheme which is that WTA 
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circuits update equally often on average. In sampling algorithms like Gibbs sampling, differences 
in the variables update frequencies do not affect the stationary distribution, but in our case it does. 
One can show that this is a consequence of the fact that the transition operator T does not obey 
the detailed balance equation while the transition operator for the Gibbs sampler does. 

• In comparing the statistics of d{t) and those of the sequence generated by T, we are comparing the 
time durations of the different configurations in d{t) to their relative frequencies of occurrence in 
the MCMC sequence. This way, we are in essence assigning an equal time duration to each MCMC 
sample which exposes another assumption of the MCMG scheme, namely, that the WTA circuits 
update their states at equal intervals (the updated state might be the same as the previous one). 
In the actual network, there is no clear-cut update cycle. The WTA circuits might update their 
states in very quick succession so the intermediate configurations might have very short duration, 
or one configuration might persist for a relatively long time before a WTA updates its state (i.e, 
the inhibition goes high in this WTA and the winner is evaluated). 

To further investigate the difference between the network statistics and those of the MCMC operator, 
we constructed random regular graphs of different sizes and node degrees. From each graph we constructed 
a network in the following way: each node in the graph is a binary WTA and each edge represents a 
consistency condition which could either be an equality or inequality condition. We then used Kullback- 
Leibler(KL) divergence to quantify the difference between the “probability distribution” generated by 
the network and the stationary distribution of the equivalent MCMC operator. The results are shown in 
Fig SI for different graph/network sizes and node degrees. The above-mentioned points of discrepancy 
lead to a more pronounced difference in the two sets of statistics in larger networks/graphs. To verify 
that this is not an effect of finite sample size, we generated a larger number of MCMC samples and ran 
the network for longer durations; the KL-divergence between the two sets of statistics remained virtually 
unchanged. The match between the two sets of statistics does not seem to depend on the network/graph 
degree. 

Properties of the network-equivalent MCMC operator 

The transition operator T for an arbitrary network constructed as outlined above is aperiodic as there 
is always a non-zero probability that any configuration persists for an arbitrary number of steps in the 
sequence. Unlike transition operators used in typical MCMC applications, T does not obey detailed 
balance so the generated Markov chain of samples is irreversible. Also, T is not irreducible in general. 
This means that in general, for networks of interacting WTA circuits, the space of possible configurations 
can be decomposed into a number of non-communicating sets and one set of transient configurations. 
Starting the chain from a configuration from the transient set, there is a non-zero probability that this 
configuration will never be visited again. If started in one of the non-communicating sets, the network 
trajectory will always stay in this set of configurations. Hence, the long-term frequency of occurrence, 
or the “probability distribution”, over the network configurations will in general depend on the initial 
configuration of the network. 

The reducibility of T is a desirable property in some circumstances. For example, T is reducible for a 
network where there is a configuration that satisfies all consistency conditions as this configuration persists 
forever. Having fully consistent configurations as absorbing states enables this kind of networks to be 
used for solving constraint satisfaction problems. This behavior is useful in the perceptual multi-stability 
model as it allows the network to form stable percepts if the input is unambiguous. T is also reducible 
for a network where there is a configuration that violates all consistency conditions as this configuration 
will never be visited. This could be beneficial as it reduces the space of configurations explored by the 
network by discarding fully inconsistent states for which there is no evidence, either from external input, 
or from the consistency conditions in the network. More elaborate networks can be constructed that do 
not fall into these two categories but that map to a reducible stochastic transition operator by virtue of 
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having two or more non-communicating sets of recurrent states (i.e non-transient states). External input 
can switch the network configuration from one subset to another. The network can thus explore different 
parts of the configuration space based on external input. The reducibility of T is, however, undesirable 
if the goal is to generate samples from a unique probability distribution irrespective of initial conditions 
which is a typical requirement in many MCMC applications. 

High order consistency conditions and the explaining-away effect 

We consider a general consistency condition, C, that involves m WTA circuits: Vx,... ,Vm, where the 
number of distinct states in WTA i (the number of excitatory populations) is K^. C is simply the set of 
allowed configurations of the m WTA circuits. We require C to be non-empty. C can be implemented in 
the following way: 

1. Define K = YiiZT~^ Introduce a “hub” WTA, H, which has K excitatory populations in its 
WTA. 

2. For each of the K possible configurations of Vi,..., Vm-i, connect each of the excitatory populations 
that are active in this configuration bidirectionally to one distinct excitatory population in H. We 
have thus associated one population in H to each configuration of the WTA circuits Vi,, Idn-i- 

3. Connect the excitatory populations of WTA Vm to the excitatory populations of H bidirectionally 
so that the excitatory populations that are active in each of the allowed (consistent) configurations 
of C connect to an excitatory population in H bidirectionally. 

4. Remove all populations in H that are not bidirectionally connected to a population in Vm. These 
populations encode the configurations of the first to — 1 WTA circuits that are not part of any 
consistent configurations. 

We can show in a way that is analogous to the proofs of propositions 1 and 2 that the described scheme 
will lead to dynamics that keep flipping the states of WTA circuits H,Vi,..., Vm in an aperiodic manner 
as long as Vi,..., Kn are inconsistent according to C. The only stable state is one in which Vi,..., 
satisfy C and the state of H reflects the configuration of WTAs Vi,. ■ ■, I4n-i- 

High-order consistency conditions can capture complex dependencies between the states of multiple 
WTA circuits that can not be captured by pairwise connections between the WTA circuits. One example 
is the dependencies that give rise to the “explaining away” phenomenon which is believed to underlie 
several aspects of visual perception. Explaining away occurs in situations where many causes can give 
rise to a particular outcome. When this particular outcome is observed, the individual likelihood of each 
cause increases as at least one of the causes is needed to explain the outcome. If in addition, one of the 
causes is observed to have taken place, the likelihood of the other causes goes down again. The observed 
cause explains away the other causes as they are no longer needed to justify the outcome. This form 
of inter-causal reasoning establishes a dependency between the different causes of a particular outcome 
when this outcome is observed. Fig. S2a shows a network used to model the dependencies that give rise to 
the explaining away effect. Two causes, causel and cause2, can each give rise to an outcome. The causes 
and the outcome are binary and they are represented by one WTA each. The causes and the outcome 
are coupled to a hub WTA so that the binary outcome is effectively an OR function of the two causes. 
For each cause, we introduce three unitary WTA circuits where two unitary WTA circuits project to the 
0 population and one projects to the 1 population of the cause WTA. That in effect introduces a “prior” 
over the states of the two causes that assigns a higher likelihood to them being 0. Each WTA oscillates 
at a slightly different frequency. The network as a whole can not have a consistent state as each cause 
can not simultaneously satisfy the contradictory requirements of the unidirectional consistency conditions 
coming from the unitary WTA circuits. 
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cause! 00110011 

outcome 00001111 


Normalized duration of causel=l 


0.3283 



to 1 clamped to 1 
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Figure S2. A High-order consistency condition gives rise to the explaining away effect, (a) Network 
topology. The oscillatory inhibition input to each WTA is not shown. The high-order consistency 
condition encoded in the network connectivity requires the outcome to be an OR function of the two 
causes, (b) Normalized duration of occurrence of the 8 possible states of the outcome, causel, and 
cause! WTA circuits under three conditions: network running freely, outcome WTA is clamped to 1, 
and outcome and cause! clamped to 1. In each condition, the network was simulated for 10^ seconds, 
(c) Normalized duration of the configurations in which causel is 1 under the three aforementioned 
conditions. We interpret this as the marginal probability of causel=l under the three conditions. 
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Figure S2b shows the durations of time the network spends in the eight possible configurations of 
the two causes and the outcome. In the free-running network, there are four configurations that satisfy 
the high-order consistency condition (the OR relation). Due to the action of the “prior”, configurations 
that satisfy the high-order consistency condition have a higher “probability” if more of the causes are 
0. The strong “prior” can override the high-order consistency condition and assign a high “probability” 
to a configuration in which the two causes are 0 and which does not satisfy the high-order consistency 
condition. Similar trends can be seen when the outcome WTA is clamped to 1 and when both the outcome 
WTA and the cause2 WTA are clamped to 1. Figure S2c highlights the explaining away effect. In the 
free running network, the normalized duration of time in which causel is equal to 1 is 0.2522. This can be 
interpreted as the “marginal probability” of causel=I. When the outcome is clamped to 1, this marginal 
probability increases as one of the causes is needed to explain the outcome (it can now be interpreted 
as the marginal probability conditioned on outcome=l). When in addition cause2 is clamped to 1, the 
probability of causel=I goes down again as causel is not needed to explain the outcome when cause2 is 
1 . 

Parameter values used in simulations 

Table SI contains the parameter values for the network simulations in Figs.l, 3, and 4 in the main 
text and supplementary Figs. SI and S2. The noise and phase coupling terms are only non-zero in the 
simulations in Fig. 4. Table S2 contains the parameter values used in the Sudoku network in Fig.2. 
Table S3 contains the parameter values used in the simulations of the perceptual multi-stability model 
(Figs. 6 and 7). Parameter values are only shown in tables S2 and S3 if they differ from those in table SI. 
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Oscillatory inhibition 

d 

Pif) 

A 

Wosc 

^osc 

0.6 

Uniform(45,46) FIz 
40 Hz 
-3 
-1 

Fraction of oscillation cycle with active inhibitory output 

Distribution of oscillation frequencies 

Amplitude of oscillatory inhibition 

Weight oscillatory inhibition to excitatory populations 

Weight oscillatory inhibition to inhibitory populations 

Neuron Populations 

^NMDA 

Ti 

^AMPA 

Jnmda 

^rec 

„,,AMPA 

Wei 

„..NMDA 

Wei 

...GABA 

WiE 

...AMPA 

tfMDA 

0.3 ms 
0.2 ms 

80 ms 

8 

-18 

1.2 

0.004 

0.5 

0.3 

-1 

0.06 

0.005 

Excitatory population time constant 

Inhibitory population time constant 

NMD A time constant 

Inhibitory population threshold 

Excitatory population threshold 

Recurrent intra-WTA non-NMDA excitatory weight 

Recurrent intra-WTA NMD A excitatory weight 

Intra-WTA non-NMDA Excitatory to inhibitory weight 

Intra-WTA NMD A Excitatory to inhibitory weight 

Intra-WTA Inhibitory to excitatory weight 

weight of inter-WTA non-NMDA coupling connection from to xfj 

weight of inter-WTA NMD A coupling connection from x^^ to xf.^ 

Noise and phase coupling parameters (only for the simulations in Fig. 4 

^LTU 

^OSC 

200 

50 

0.7 

5 

noise scaling factor in the high noise case 
noise scaling factor in the low noise case 

phase perturbation per cycle for both the low-noise and high-noise case 
Phase coupling strength between V0-V3 


Table SI. Network parameters used for the simulations in Figs. 1, 3, and 4 in the main text and 
supplementary Figs. SI and S2 


Oscillatory inhibition 

d 

0.17 

Fraction of oscillation cycle with active inhibitory output 

pif) 

Uniform(40,60) Hz 

Distribution of oscillation frequencies 

A 

40 Hz 

Amplitude of oscillatory inhibition 

Neuron Populations 


0.5 ms 

Excitatory population time constant 

Ti 

6 

Inhibitory population threshold 

r-j^E 

-2 

Excitatory population threshold 

AMPA 

^rec 

1.8 

Recurrent intra-WTA non-NMDA excitatory weight 

NMDA 

^rec 

0.0001 

Recurrent intra-WTA NMDA excitatory weight 

...AMPA 

Wei 

0.6 

Intra-WTA non-NMDA Excitatory to inhibitory weight 

...GABA 

^lE 

-1.6 

Intra-WTA Inhibitory to excitatory weight 

„,AMPA 

^k,p—¥iA 

0.002 

weight of inter-WTA non-NMDA coupling connection from xf'p to xf^ 

ErMDA 

0.0 

weight of inter-WTA NMDA coupling connection from xf^ to xf^ 


Table S2. Network parameters used for the simulations of the Sudoku network in Fig. 2. Only 
parameter values that are different from table SI are shown. 
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Oscillatory inhibition 

d 

fa 

Pif) 

0.27 
45 Hz 

Uniform(43,53) Hz 

Fraction of oscillation cycle with active inhibitory output 

Oscillation frequency to which oscillators couple, when synchronized 
Distribution of oscillation frequencies 

Neuron Populations 

^NMDA 

Jgaba 

^lE 

0.5 ms 
-2 

0.005 

-1.1 

Excitatory population time constant 

Excitatory population threshold 

Recurrent intra-WTA NMD A excitatory weight 

Intra-WTA Inhibitory to excitatory weight 

Plasticity rule parameters 

^max 

'^min 

e 

Vup 

Vdown 

dup 

ddown 

0.03 

0 

38 Hz 
7.5 X 10-Vs 
3.75 X 10-Vs 
0.002/s 
0.0005/s 

Maximal weight of plastic synapse 

Minimal weight of plastic synapse 

Learning threshold 
potentiation rate 
depression rate 
upward drift rate 
downward drift rate 

Readout WTA (where different from Table SI) 

Win 

^AMPA 

Jnmda 

^rec 

„.,AMPA 

Wei 

0.000216 

1 

0.001 

0.3 

Weight from hidden WTA to Readout WTA 

Recurrent intra-WTA non-NMDA excitatory weight 

Recurrent intra-WTA NMD A excitatory weight 

Intra-WTA non-NMDA Excitatory to inhibitory weight 


Table S3. Network parameters used in the perceptual multi-stability model (Figs. 6 and 7). Only 
parameter values that are different from table SI are shown. 
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